Data Exploration and Visualisation
What is Exploratory Data Analysis?
What is not Exploratory Data Analysis?
Issues around Exploratory Data Analysis.
Exploratory Data Analysis: quick and simple excerpts, summaries and plots to better understand a data set.
Iterative, not put into production
EDA notebooks can be helpful
Document and share what is often an ad-hoc process
Balance between reproducibility and time cost
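As a rough illustration of the "quick and simple excerpts and summaries" described above, here is a minimal sketch using only the Python standard library. The data are simulated and the helper `quick_summary` is a hypothetical name introduced for this example; neither comes from the course material.

```python
import random
import statistics

# Hypothetical data: 200 simulated measurements (not from the source).
random.seed(1)
values = [random.gauss(5.0, 2.0) for _ in range(200)]


def quick_summary(xs):
    """Return a small dictionary of summary statistics for a numeric list."""
    q1, median, q3 = statistics.quantiles(xs, n=4)  # three quartile cut points
    return {
        "n": len(xs),
        "min": min(xs),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(xs),
        "mean": statistics.fmean(xs),
        "sd": statistics.stdev(xs),  # sample standard deviation
    }


excerpt = values[:5]  # a first look at a few raw values
summary = quick_summary(values)

print("excerpt:", [round(v, 2) for v in excerpt])
print("n:", summary["n"])
for name in ("min", "q1", "median", "q3", "max", "mean", "sd"):
    print(f"{name}: {summary[name]:.2f}")
```

Keeping a snippet like this in a notebook is one low-cost way to document an otherwise ad-hoc first look at the data.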
An effective EDA sets a precedent for open communication with the stakeholder and project manager.
EDA is an initial assessment of whether the available data measure the correct values, in sufficient quality and quantity, to answer a particular question.
This requires:
EDA is not modelling.
EDA is not IDA (initial data analysis).
EDA is not assumption free.
EDA is not prescriptive.
| after_june_98 | mean | sd |
|---|---|---|
| FALSE | 5.916798 | 65.19093 |
| TRUE | 3.972929 | 119.56067 |
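A grouped summary of this shape can be sketched as follows. This is an illustration only: the records are made up, the cutoff of 1 July 1998 is an assumed reading of `after_june_98`, and the code does not reproduce the values in the table.

```python
from collections import defaultdict
from datetime import date
import statistics

# Hypothetical records (date, value); these are NOT the data behind the table.
records = [
    (date(1998, 3, 14), 4.0),
    (date(1998, 5, 2), 6.0),
    (date(1998, 6, 20), 8.0),
    (date(1998, 8, 9), 1.0),
    (date(1998, 11, 30), 3.0),
    (date(1999, 1, 15), 11.0),
]

# Assumed cutoff: an observation counts as "after June 98" from 1 July 1998.
CUTOFF = date(1998, 7, 1)

# Group the values by the TRUE / FALSE indicator.
groups = defaultdict(list)
for day, value in records:
    groups[day >= CUTOFF].append(value)

# Mean and sample standard deviation within each group.
summary = {
    flag: {
        "n": len(vals),
        "mean": statistics.fmean(vals),
        "sd": statistics.stdev(vals),
    }
    for flag, vals in groups.items()
}

for flag in (False, True):
    row = summary[flag]
    print(f"after_june_98={flag}: n={row['n']}, mean={row['mean']:.2f}, sd={row['sd']:.2f}")
```

Reporting the standard deviation next to the mean matters here: in the table, both standard deviations are far larger than the difference between the two group means.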
Example: selecting a null model.
Using information that you would not otherwise have access to when fitting a model or constructing a prior.
This “peeking” is often subtle or indirect, making it hard to specify.
Train / test split or using EDA to select question / model of interest.
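One way to limit this kind of peeking is to split the data before exploring it. The sketch below assumes simulated data and a hypothetical helper, `train_test_split`, written for this example (it is not a library function); it explores only the training portion and applies anything learned there to the held-out portion.

```python
import random
import statistics

# Hypothetical data set of 100 simulated observations (not from the source).
rng = random.Random(42)
data = [rng.gauss(0.0, 1.0) for _ in range(100)]


def train_test_split(rows, test_fraction, rng):
    """Shuffle a copy of the rows and split off a held-out test set."""
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Split first, then explore only the training portion.
train, test = train_test_split(data, test_fraction=0.2, rng=rng)

train_mean = statistics.fmean(train)
train_sd = statistics.stdev(train)

# Any quantity learned during exploration (here, a standardisation) is
# estimated on the training data and only applied to the test data.
test_standardised = [(x - train_mean) / train_sd for x in test]

print(f"explored {len(train)} rows; held out {len(test)} rows")
```

The split does not remove every route for leakage, for example when the EDA itself is used to choose the question of interest, but it keeps the held-out rows out of the exploratory summaries.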
Corrections to testing estimation procedures:
Avoided by preregistration.
Humility and follow-up required in data science.
EDA is an important step in the life-cycle of a data science project.
An EDA can guide our project but risks data leakage issues.
EDAs are not often available publicly or written about in detail.
Learn from your own experience and explore lots of what other people do
Some starting points:
Effective Data Science: EDAV - Exploration - Zak Varty